METHOD FOR COMPUTING MODELS BASED ON
ATTRIBUTES SELECTED BY ENTROPY 

RELATED APPLICATIONS 

The present invention is related to the subject matter
of the following commonly assigned, copending United States
patent applications: serial no. 09/______ (Docket No.
AT9-99-038) entitled "COMPILING MESSAGE CATALOGS FOR
CLIENT-SIDE CONSUMPTION" and filed ______, 1999; serial no.
09/______ (Docket No. AT9-99-039) entitled "DYNAMIC SCREEN
CONTROL" and filed ______, 1999; and serial no. 09/______
(Docket No. AT9-99-040) entitled "METHOD OF CUSTOMIZING
SCREEN AND APPLICATION BEHAVIOR USING ATTRIBUTE METADATA IN
AN APPLICATION DATABASE" and filed ______, 1999. The content
of the above-referenced applications is incorporated herein
by reference.

BACKGROUND OF THE INVENTION 
1. Technical Field:

The present invention relates in general to data 
analysis and in particular to qualifying sample populations 
employed in predictive data analysis. Still more 
particularly, the present invention relates to reducing the 
number of attributes of a sample population employed in 
generating a predictive model based on the sample 
population. 
2. Description of the Related Art:

A wide array of subjects are the focus of contemporary
data collection, such as customer data which is collected by
various industries, medical information which is collected
for development of diagnostics and treatment protocols, or
data relating to insurable or potentially insurable
activities which is collected for insurance risk assessment.
Collection of such data has become so routine and pervasive
that "data mining" is frequently required to separate useful
information from dross.

Data collection is frequently undertaken for the
purposes of developing predictive models. That is, by
statistical analysis of characteristics of a sample
population, attempts are made to derive models which may
predict, with reasonable accuracy, whether an individual
subject will exhibit a characteristic or group of
characteristics of interest based on known characteristics
of that subject. For instance, marketing firms may attempt
to develop predictive models for determining which
individuals within a target population are most likely to
respond to a particular promotional campaign.

Contemporary data collection generally proceeds more or
less indiscriminately. That is, those engaged in collection
of data typically collect as much data regarding each
individual subject as possible, without regard to the
ultimate usefulness of the data in, for example, developing
a predictive model. This may result from uncertainty
regarding which characteristics are most useful for a
particular purpose and/or the simplistic conviction that
more data will produce better results. More frequently,
however, indiscriminate data collection results instead from
the use of a data set for more than one purpose, spreading
the cost of the data collection among multiple projects.

One effect of indiscriminate data collection on the
development of predictive models is the inefficiency and
error introduced by large data samples. A sample population
may include data for five hundred or more characteristics of
each individual subject within the sample population.
Attempting to generate a predictive model based on that many
individual characteristics is computationally inefficient.
Furthermore, as the number of characteristics or attributes
employed in generating the predictive model increases, the
probability that the sample population is skewed by one or
more characteristics or attributes also increases.

It would be desirable, therefore, to provide a
mechanism for preprocessing a sample population to reduce
the number of attributes or characteristics employed in
generating a predictive model. It would further be
advantageous if the mechanism eliminated characteristics
which might skew the sample population and thereby degrade
the accuracy of a predictive model generated from such data.

SUMMARY OF THE INVENTION

It is therefore one object of the present invention to 
provide improved data analysis. 

It is another object of the present invention to 
provide improved qualification of sample populations 
employed in predictive data analysis. 

It is yet another object of the present invention to 
provide a technique for reducing the number of attributes of 
a sample population employed in generating a predictive 
model based on the sample population. 

The foregoing objects are achieved as is now described. 
Attributes of a data set to be employed in generating a 
predictive model are analyzed based on entropy, chi-square, 
or similar statistical measure. A target group of samples 
exhibiting one or more desired attributes is identified, 
then remaining attribute values for the target group are 
compared to corresponding attribute values for the whole 
sample population. A subset of all available attributes is 
then selected from those attributes which exhibit, when 
comparing attribute values of target group samples to 
attribute values for the whole sample population, the 
greatest relative difference or divergence. That is, an 
attribute for which the target group samples exhibit, for 
example, only two of all possible values is selected in 
preference to an attribute for which the target group 
samples exhibit three or more of the possible values. This 
subset is employed to generate the predictive model. 
Efficiency in generating the predictive model is improved,
since fewer attributes are employed and less computational
resources are required. Accuracy of the resulting 
predictive model is also improved since attributes 
potentially skewing the sample population in a manner least 
related to the desired attribute are eliminated from 
consideration in developing the model. 

The above as well as additional objects, features, and 
advantages of the present invention will become apparent in 
the following detailed written description. 
BRIEF DESCRIPTION OF THE DRAWINGS

The novel features believed characteristic of the 
invention are set forth in the appended claims. The 
invention itself, however, as well as a preferred mode of
use, further objects and advantages thereof, will best be 
understood by reference to the following detailed 
description of an illustrative embodiment when read in 
conjunction with the accompanying drawings, wherein: 

Figure 1 depicts a block diagram of a data processing 
system in which a preferred embodiment of the present 
invention may be implemented; 

Figure 2 is a logical block diagram for a mechanism for 
preprocessing a sample population to be employed in 
generating a predictive model in accordance with a preferred 
embodiment of the present invention; and 

Figure 3 depicts a high level flow chart for a process 
of selecting attributes of a sample for generating a 
predictive model in accordance with a preferred embodiment 
of the present invention. 
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

With reference now to the figures, and in particular 
with reference to Figure 1, a block diagram of a data 
processing system in which a preferred embodiment of the 
present invention may be implemented is depicted. Data 
processing system 100 may be, for example, one of the models 
of personal computers available from International Business 
Machines Corporation of Armonk, New York. Data processing 
system 100 includes a processor 102, which in the exemplary 
embodiment is connected to a level two (L2) cache 104, 
connected in turn to a system bus 106. In the exemplary 
embodiment, data processing system 100 includes graphics 
adapter 116 also connected to system bus 106, receiving user 
interface information for display 120.

Also connected to system bus 106 is system memory 108 
and input/output (I/O) bus bridge 110. I/O bus bridge 110 
couples I/O bus 112 to system bus 106, relaying and/or 
transforming data transactions from one bus to the other. 
Peripheral devices such as nonvolatile storage 114, which 
may be a hard disk drive, and keyboard/pointing device 116, 
which may include a conventional mouse, a trackball, or the 
like, are connected to I/O bus 112.

The exemplary embodiment shown in Figure 1 is provided 
solely for the purposes of explaining the invention and 
those skilled in the art will recognize that numerous 
variations are possible, both in form and function. For 
instance, data processing system 100 might also include a 
compact disk read-only memory (CD-ROM) or digital video disk 
(DVD) drive, a sound card and audio speakers, and numerous 
other optional components. All such variations are believed
to be within the spirit and scope of the present invention. 
However, data processing system 100 is preferably programmed 
to provide a mechanism for preprocessing a sample population 
to be employed in generating a predictive model by reducing 
the number of attributes of the sample population which are 
utilized in generating the predictive model. 

Referring to Figure 2, a logical block diagram for a 
mechanism for preprocessing a sample population to be 
employed in generating a predictive model in accordance with 
a preferred embodiment of the present invention is 
illustrated. The mechanism 200 includes a preprocessing
module 202 which receives a sample 204. Sample 204 is
preferably a relational database containing a number of data
elements 206 (rows). Each data element 206 includes various
attributes 208 or characteristics (columns). In the example
contemplated, sample 204 may contain any arbitrary number of 
data elements, and the number of attributes 208 of each data 
element 206 may equal or exceed 500. 

Preprocessing module 202 receives sample 204 in which
at least some data elements possess a desired attribute or 
group of attributes, which may be referred to as a "target" 
data set. Module 202 then selects other attributes for data 
elements within sample 204 to be used in generating a 
predictive model by statistically analyzing sample 204 to
determine which attributes of the target data elements 
differ the most from corresponding attributes of the sample 
population as a whole. The attributes for which sample 
instances having a desired characteristic have values which 
are the most different from corresponding attribute values 
of the sample population generally are selected.

For the purposes of this description only, and without
intending to imply any limitations to the present invention,
sample 204 may be viewed as logically divided within module
202 into a target group 210 and one or more other groups 212
based on the value of a specific attribute. Data elements 
within sample 204 having the desired attribute value or 
values are categorized in target group 210; data elements 
within sample 204 not having the desired attribute or 
attributes are categorized in one of the other groups 212. 
The other attributes--those not forming the basis of logical
division of sample 204--of target group 210 are then
compared to the attributes of entire sample 204 to determine 
which attributes exhibit the largest difference between 
target group 210 and sample 204. 
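
The logical division described above can be sketched in Python as follows. This is a minimal illustration only, assuming each data element is held as a dictionary keyed by attribute name; the function name split_target and the example rows are hypothetical and do not appear in the embodiment.

```python
def split_target(sample, attribute, desired_values):
    """Divide a sample into a target group and the remaining data elements.

    `sample` is a list of dicts (one per data element); `desired_values`
    is the set of values of `attribute` that define the target group.
    """
    target = [row for row in sample if row[attribute] in desired_values]
    others = [row for row in sample if row[attribute] not in desired_values]
    return target, others


# Hypothetical sample with a target attribute "A" and one other attribute "B".
sample = [
    {"A": "Y", "B": "red"},
    {"A": "N", "B": "blue"},
    {"A": "UNKNOWN", "B": "red"},
    {"A": "Y", "B": "green"},
]

target_group, other_groups = split_target(sample, "A", {"Y"})
```

Note that the later comparison is made between the target group and the entire sample, not between the target group and the other groups; the second return value is shown only to mirror the logical division.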

For example, suppose an attribute A may have three
possible values: "Y," "N," and "UNKNOWN," and that the
interest lies in building a predictive model to predict when
attribute A will have the value "Y." Samples having the
value "Y" for attribute A are therefore categorized as 
target group 210. The remaining attributes B, C and D for 
the samples are then compared, with the attribute values 
within target group 210 being compared to the attribute 
values for entire sample 204. Attributes with the largest 
difference between the target group 210 and the entire 
sample 204 are selected. 

Suppose attribute B, for instance, has five possible 
values within sample 204, but target group 210 only includes
two of those values for attribute B among constituent
samples. Similarly attribute C also has five possible
values, four of which are exhibited by members of target
group 210. Attribute D has ten different values within
sample 204, but only three of those values are found for
attribute D within target group 210. In this case,
attribute B has a larger relative difference between target
group 210 and sample 204 than attribute C, since it has less
overlap in attribute values. Attribute D exhibits a greater 
difference or divergence between samples having the desired 
attribute and the whole sample population than either of 
attributes B or C. Thus, attribute D would be selected in
preference to attributes B or C in generating a predictive 
model, and attribute B would be chosen over attribute C. 
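
The ordering in this example can be reproduced with a short Python sketch. The measure used here, the fraction of the whole sample's distinct values that also occur in the target group, is an assumption chosen to match the "overlap" reasoning above; the description does not fix a formula, and the names value_coverage and observed are hypothetical.

```python
def value_coverage(target_values, all_values):
    """Fraction of the distinct values in the whole sample that also
    occur in the target group (lower coverage = larger difference)."""
    all_distinct = set(all_values)
    target_distinct = set(target_values) & all_distinct
    return len(target_distinct) / len(all_distinct)


# Distinct values per attribute: (seen in target group, seen in whole sample).
observed = {
    "B": (["b1", "b2"], ["b1", "b2", "b3", "b4", "b5"]),
    "C": (["c1", "c2", "c3", "c4"], ["c1", "c2", "c3", "c4", "c5"]),
    "D": (["d1", "d2", "d3"], [f"d{i}" for i in range(1, 11)]),
}

coverage = {name: value_coverage(t, a) for name, (t, a) in observed.items()}

# Attributes ordered from largest difference (least coverage) to smallest.
ranking = sorted(coverage, key=coverage.get)
```

Under this assumed measure, attribute D covers 3 of 10 values, B covers 2 of 5, and C covers 4 of 5, so the ranking is D, then B, then C, in agreement with the example.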

The example described above utilizes four attributes 
and a relatively small number of possible values for each 
attribute. In practice, however, the process described may 
be applied to samples each having 500 or more attributes, 
with as many possible attribute values as there are samples 
for some attributes. Known statistical parameters such as 
entropy:

$$H(p_1, \ldots, p_n) = -\sum_{i=1}^{n} p_i \log p_i$$

and/or chi-square may be utilized to evaluate the attributes 
to determine which exhibit the greatest difference. 
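
As a hedged illustration of these two measures, the following Python sketch computes the entropy of an attribute's observed values and a Pearson chi-square statistic comparing the target group's value counts with the counts expected from the whole sample's proportions. The base-2 logarithm, the particular way chi-square is applied, and the function and variable names are all assumptions; the description names the measures without prescribing these details.

```python
import math
from collections import Counter


def entropy(values):
    """Entropy of the observed value distribution (base-2 log is an assumption)."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def chi_square(target_values, all_values):
    """Pearson chi-square of target-group counts against the counts expected
    from whole-sample proportions. Assumes the target group is drawn from
    the whole sample, so every target value also occurs in all_values."""
    n_target, n_all = len(target_values), len(all_values)
    if n_target == 0 or n_all == 0:
        return 0.0
    target_counts = Counter(target_values)
    statistic = 0.0
    for value, count in Counter(all_values).items():
        expected = n_target * count / n_all
        observed = target_counts.get(value, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical attribute: three values in the whole sample, one in the target group.
all_values = ["x", "x", "x", "x", "y", "y", "z", "z"]
target_values = ["x", "x", "x", "x"]

h_all = entropy(all_values)        # 1.5 bits for proportions 1/2, 1/4, 1/4
h_target = entropy(target_values)  # zero: the target group shows a single value
chi2 = chi_square(target_values, all_values)
```

A large gap between h_all and h_target, or a large chi-square statistic, both indicate that the target group's values for the attribute are distributed differently from those of the sample as a whole.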

In the exemplary embodiment contemplated, a predetermined
number of attributes (e.g., ten) which exhibit the greatest
difference with respect to attribute values, other than
those for the desired attribute or group of attributes, for
the target group 210 versus the entire sample 204 is
selected. However, any arbitrarily sized subset of
attributes selected based on entropy may be employed. 
Additionally, the number of attributes selected for 
employment in generating a predictive model need not be a 
fixed size, but may instead be a percentage of the total 
number of attributes, or only those attributes in which the 
values for sample instances having the desired
characteristic exhibit a threshold amount of difference from the
corresponding attribute values for the entire sample 
population. 
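
The three sizing options just described (a fixed count, a percentage of all attributes, or a difference threshold) might be sketched as follows. The function names, the rounding up of the percentage, and the example scores are assumptions made for illustration.

```python
import math


def select_top_n(scores, n):
    """Return the n attribute names with the largest difference scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]


def select_top_percentage(scores, percentage):
    """Return the top `percentage` percent of attributes (rounded up; an assumption)."""
    n = math.ceil(len(scores) * percentage / 100)
    return select_top_n(scores, n)


def select_above_threshold(scores, threshold):
    """Return attributes whose difference meets the threshold.
    Whether the bound is inclusive is a design choice; >= is used here."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked if scores[name] >= threshold]


# Hypothetical per-attribute difference scores (larger = more different).
scores = {"B": 0.6, "C": 0.2, "D": 0.7, "E": 0.05}

fixed = select_top_n(scores, 2)
by_percentage = select_top_percentage(scores, 50)
by_threshold = select_above_threshold(scores, 0.2)
```

With these example scores, the fixed-count and percentage options both yield D and B, while the threshold option also admits C.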

Once the attributes which are to be employed in 
generating a predictive model have been selected as 
described above, module 202 identifies the selected 
attributes (e.g., by a list of attribute names) for model 
generator 214. Model generator 214 may then generate a
predictive model based on sample 204 utilizing the selected 
attributes in accordance with techniques known to those 
skilled in the art. 
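
One way the list of selected attribute names could be applied before model generation is to project each data element onto those attributes plus the attribute being predicted. The sketch below assumes rows stored as dictionaries; the function name project and the sample data are hypothetical, and the model generator itself is not shown.

```python
def project(sample, selected, target_attribute):
    """Keep only the selected attributes and the attribute to be predicted."""
    keep = [*selected, target_attribute]
    return [{name: row[name] for name in keep} for row in sample]


# Hypothetical sample with three candidate attributes and target attribute "A".
sample = [
    {"A": "Y", "B": "red", "C": 1, "D": "north"},
    {"A": "N", "B": "blue", "C": 2, "D": "south"},
]

reduced = project(sample, selected=["D", "B"], target_attribute="A")
```

The reduced rows can then be handed to whatever model-building technique is in use, which sees only the selected attributes.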

With reference now to Figure 3, a high level flow chart 
for a process of selecting attributes of a sample for 
generating a predictive model in accordance with a preferred 
embodiment of the present invention is depicted. The 
process begins at step 302, which depicts a model build 
being initiated. A data set, from which a sample population 
may be drawn including at least one sample having a desired 
attribute, should be available for building the desired 
predictive model. If less than the entire data set is 
employed in generating the predictive model, the resulting 
predictive model may then be applied to the remaining 
samples in the data set. The desired attribute(s) for which
the predictive model is generated need not have only two
possible values (e.g., gender), but may be a relative 
measure such as a value exceeding a predetermined threshold 
(e.g., monthly usage of a service). 

The process first passes to step 304, which illustrates 
grouping the elements of the sample population based on the 
values of the attribute(s) to be the subject of prediction,
identifying a target group of samples. The process then 
passes to step 306, which depicts selecting an attribute and 
determining a relative difference or divergence in the 
attribute values for target group samples versus the whole 
sample population. A relative difference (e.g., ratio or
percentage) should be determined since comparison of 
absolute differences may not be meaningful. 
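
To illustrate why a relative measure is preferred, the following sketch expresses the difference between a target-group measure and a whole-population measure as a fraction of the population measure. The function name, the handling of a zero population measure, and the example figures are assumptions for illustration only.

```python
def relative_difference(target_measure, population_measure):
    """Difference expressed as a fraction of the whole-population measure."""
    if population_measure == 0:
        # Assumption: a degenerate attribute is treated as showing no difference.
        return 0.0
    return abs(population_measure - target_measure) / abs(population_measure)


# Two hypothetical attributes with the same absolute difference (0.5)
# but very different relative differences.
narrow = relative_difference(target_measure=0.5, population_measure=1.0)
wide = relative_difference(target_measure=3.5, population_measure=4.0)
```

Here the first attribute differs by half of its population measure and the second by one eighth, although the absolute difference is identical, so only the relative figure separates them.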

The process then passes to step 308, which illustrates 
a determination of whether all attributes available for the 
sample population, other than those for which the predictive 
model is being built, have been considered. If all 
attributes for the sample population have not been 
considered, the process returns to step 306 to select 
another attribute for analysis and repeat the process of 
step 306 with the newly selected attribute.

Once all attributes for the sample population have been 
analyzed, the process proceeds from step 308 to step 310, 
which depicts selecting n attributes exhibiting the largest 
relative differences for samples having the desired 
attributes as compared to all samples within the sample 
population. A sort or ranking of the attributes by such 
relative difference may be useful in this step. The number 
n of attributes selected may be any arbitrarily set number
or, as described above, may be a predetermined percentage of 
the attributes or attributes exhibiting a relative 
difference between samples which exceeds a predetermined 
threshold. 

The process next passes to step 312, which illustrates 
building a model for the desired attribute and the sample 
population utilizing the selected attributes. Various known 
techniques may be employed for this purpose. The process 
passes then to step 314, which depicts applying the 
predictive model generated to a data set. Finally, the 
process passes to step 316, which illustrates the process 
becoming idle until another model build is undertaken. 
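
Steps 304 through 310 of the process may be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the implementation of the embodiment: the relative entropy difference is used as the measure, rows are dictionaries, the sample is non-empty, and the function names and data are hypothetical. Model building (step 312) is left to whatever known technique is employed.

```python
import math
from collections import Counter


def entropy(values):
    """Entropy of the observed value distribution (base-2 log is an assumption)."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_attributes(sample, target_attribute, desired_values, n):
    # Step 304: group the sample by the attribute to be predicted.
    target_group = [row for row in sample if row[target_attribute] in desired_values]

    # Steps 306-308: loop over every other attribute and record a
    # relative difference between the target group and the whole sample.
    differences = {}
    for attribute in sample[0]:
        if attribute == target_attribute:
            continue
        h_all = entropy([row[attribute] for row in sample])
        h_target = entropy([row[attribute] for row in target_group])
        differences[attribute] = abs(h_all - h_target) / h_all if h_all else 0.0

    # Step 310: rank by relative difference and keep the n largest.
    ranked = sorted(differences, key=differences.get, reverse=True)
    return ranked[:n]


# Hypothetical sample: B is uniform within the target group, C is not.
sample = [
    {"A": "Y", "B": "x", "C": "p"},
    {"A": "Y", "B": "x", "C": "q"},
    {"A": "Y", "B": "x", "C": "r"},
    {"A": "Y", "B": "x", "C": "s"},
    {"A": "N", "B": "y", "C": "p"},
    {"A": "N", "B": "y", "C": "q"},
    {"A": "N", "B": "z", "C": "r"},
    {"A": "N", "B": "z", "C": "s"},
]

selected = select_attributes(sample, "A", {"Y"}, n=1)
```

In this data the target group takes a single value for B but spreads evenly over all values of C, so B is the attribute selected.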

The present invention allows data collections having
large numbers of potentially irrelevant or meaningless 
attributes for each sample to be employed in building an 
accurate predictive model. Efficiency in generating the 
predictive model is improved by reducing the number of 
attributes which are considered during the model build. 
This requires both less time and less computational 
resources to generate the predictive model. Accuracy of the 
resulting predictive model is also improved. Attributes 
which might skew the sample population but have no relation 
to the desired characteristic--or less relation to the
desired attribute than other attributes--are eliminated from
consideration in building the predictive model. 

It is important to note that while the present 
invention has been described in the context of a fully 
functional data processing system and/or network, those 
skilled in the art will appreciate that the mechanism of the
present invention is capable of being distributed in the 
form of a computer usable medium of instructions in a 
variety of forms, and that the present invention applies 
equally regardless of the particular type of signal bearing 
medium used to actually carry out the distribution. 
Examples of computer usable mediums include: nonvolatile, 
hard-coded type mediums such as read only memories (ROMs) or 
erasable, electrically programmable read only memories 
(EEPROMs), recordable type mediums such as floppy disks,
hard disk drives and CD-ROMs, and transmission type mediums 
such as digital and analog communication links. 

While the invention has been particularly shown and 
described with reference to a preferred embodiment, it will 
be understood by those skilled in the art that various 
changes in form and detail may be made therein without 
departing from the spirit and scope of the invention. 
CLAIMS:

What is claimed is: 

1. A method of computing a model, comprising: 

comparing attribute values for samples having a desired 
attribute to attribute values for all samples; and 

selecting a subset of available attributes based on a 
difference between attribute values for the samples having 
the desired attribute and attribute values for all of the 
samples.

2. The method of claim 1, wherein the step of comparing 
attribute values for samples having a desired attribute to 
attribute values for all samples further comprises: 

determining a statistical measure of difference between 
the attribute values for samples having the desired 
attribute and the attribute values for all of the samples. 

3. The method of claim 2, wherein the step of determining
a statistical measure of difference between the attribute 
values for samples having the desired attribute and the 
attribute values for all of the samples further comprises: 

determining an entropy for the attribute values. 

4. The method of claim 1, wherein the step of selecting a 
subset of available attributes based on a difference between 
attribute values for the samples having the desired 
attribute and attribute values for all of the samples 
further comprises:

identifying n attributes having a largest difference in 
attribute values. 
5. The method of claim 1, wherein the step of selecting a
subset of available attributes based on a difference between 
attribute values for the samples having the desired 
attribute and attribute values for all of the samples 
further comprises:

identifying a predetermined percentage of attributes 
having a larger difference in the attribute values than 
remaining attributes. 

6. The method of claim 1, wherein the step of selecting a 
subset of available attributes based on a difference between 
attribute values for the samples having the desired 
attribute and attribute values for all of the samples 
further comprises: 

identifying attributes having a difference in the 
attribute values exceeding a predetermined amount. 

7. The method of claim 1, further comprising: 
obtaining a plurality of samples, each sample having
values for a plurality of attributes.

8. The method of claim 1, further comprising: 
employing the selected subset of attributes to generate
a predictive model.
9. A method of computing a model, comprising:

obtaining a plurality of samples each having values for
a plurality of attributes;

comparing attribute values for samples having at least 
one desired attribute to attribute values for all of the 
plurality of samples; 

selecting attributes having a largest difference 
between attribute values for samples having the at least one 
desired attribute and attribute values for all of the 
plurality of samples; and 

computing a model employing the selected attributes.

10. The method of claim 9, wherein the step of selecting 
attributes having a largest difference between attribute 
values for samples having the at least one desired attribute 
and attribute values for all of the plurality of samples 
further comprises: 

identifying a predetermined number of attributes having 
the largest difference in attribute values. 

11. The method of claim 9, wherein the step of selecting 
attributes having a largest difference between attribute 
values for samples having the at least one desired attribute 
and attribute values for all of the plurality of samples 
further comprises: 

identifying a predetermined percentage of attributes 
having the relative difference in attribute values. 

12. The method of claim 9, wherein the step of selecting 
attributes having a largest difference between attribute 
values for samples having the at least one desired attribute 
and attribute values for all of the plurality of samples 
further comprises:

identifying attributes having a difference in attribute 
values equal to or greater than a predetermined amount. 
13. A method of selecting attributes for computing a model,
comprising:

for a plurality of samples each having values for a
plurality of attributes:

for each of the plurality of attributes:

comparing the attribute values for a first group of
samples to the attribute values for all of the plurality
of samples; and

determining a difference between the attribute values
for the first group and the attribute values for all of
the plurality of samples; and

identifying attributes within the plurality of
attributes having a largest difference between the
attribute values for the first group and the attribute
values for all of the plurality of samples; and

selecting at least some of the identified attributes.
14. A system for selecting attributes for computing a
model, comprising:

a memory containing data for a plurality of samples 
each having values for a plurality of attributes; and 

a processor coupled to the memory and executing a 
selection process including: 

comparing attribute values for samples having a
desired attribute to attribute values for all samples;

selecting a subset of available attributes based
on a difference between attribute values for the
samples having the desired attribute and attribute
values for all of the samples; and

employing the selected subset of attributes to generate 
a predictive model. 

15. The system of claim 14, wherein the selection process 
determines a statistical measure of difference between the 
attribute values for samples having the desired attribute 
and the attribute values for all of the samples.

16. The system of claim 15, wherein the selection process 
determines an entropy for the attribute values. 

17. The system of claim 14, wherein the selection process 
identifies a predetermined number of attributes having a 
largest difference in the attribute values for selection.

18. The system of claim 14, wherein the selection process 
identifies a predetermined percentage of attributes having a 
larger difference in the attribute values for selection. 

19. The system of claim 14, wherein the selection process 
identifies, for selection, attributes having a difference
in the attribute values exceeding a predetermined amount.
20. A system for computing a model, comprising:

a memory containing data for a plurality of samples
each having values for a plurality of attributes; and

a processor coupled to the memory and executing a
selection process including:

comparing attribute values for a first subset of
the plurality of samples to attribute values for all of
the samples;

selecting attributes having a largest difference
between attribute values for the first subset and
attribute values for all of the samples; and

computing a model employing the selected
attributes.
21. A computer program product within a computer usable
medium for selecting attributes for computing a model,
comprising:

instructions for reading values of attributes for a 
plurality of samples; 

instructions for comparing attribute values for samples 
having a desired attribute to attribute values for all 
samples; and 

instructions for selecting a subset of available 
attributes based on a difference between attribute values 
for samples having the desired attribute and attribute 
values for all samples. 

22. The computer program product of claim 21, wherein the 
instructions for comparing attribute values for samples 
having a desired attribute to attribute values for all 
samples further comprise: 

instructions for determining a statistical measure of 
difference between the attribute values for samples having 
the desired attribute and the attribute values for all 
samples.

23. The computer program product of claim 22, wherein the 
instructions for determining a statistical measure of 
difference between the attribute values for samples having 
the desired attribute and the attribute values for all 
samples further comprise:

instructions for determining an entropy of the 
attribute values for samples having the desired attribute 
and an entropy of the attribute values for all samples; 

instructions for comparing the entropy of the attribute 
values for samples having the desired attribute to the 
entropy of the attribute values for all samples for each
attribute to determine a relative measure of difference; and 

instructions for comparing the relative measure of 
difference of all attributes. 

24. The computer program product of claim 21, wherein the
instructions for selecting a subset of available attributes 
based on a difference between attribute values for samples 
having the desired attribute and attribute values for all 
samples further comprise: 

instructions for identifying n attributes having a 
largest difference in the attribute values. 
25. A computer program product within a computer usable
medium for selecting attributes for computing a model, 
comprising: 

instructions for comparing attribute values for a first 
group of samples to attribute values for all samples for 
each of a plurality of attributes; 

instructions for determining a difference between the 
attribute values for the first group of samples and the 
attribute values for all of the samples; and 

instructions for selecting a group of attributes having 
a largest difference between the attribute values for the 
first group of samples and the attribute values for all 
samples.
METHOD FOR COMPUTING MODELS BASED ON
ATTRIBUTES SELECTED BY ENTROPY 
ABSTRACT OF THE DISCLOSURE 

Attributes of a data set to be employed in generating a
predictive model are analyzed based on entropy, chi-square, 
or similar statistical measure. A target group of samples 
exhibiting one or more desired attributes is identified, 
then remaining attribute values for the target group are 
compared to corresponding attribute values for the whole 
sample population. A subset of all available attributes is
then selected from those attributes which exhibit, when 
comparing attribute values of target group samples to 
attribute values for the whole sample population, the 
greatest relative difference or divergence. That is, an 
attribute for which the target group samples exhibit, for 
example, only two of all possible values is selected in 
preference to an attribute for which the target group 
samples exhibit three or more of the possible values. This 
subset is employed to generate the predictive model. 
Efficiency in generating the predictive model is improved, 
since fewer attributes are employed and less computational 
resources are required. Accuracy of the resulting 
predictive model is also improved since attributes 
potentially skewing the sample population in a manner least 
related to the desired attribute are eliminated from 
consideration in developing the model. 
DECLARATION AND POWER OF ATTORNEY FOR PATENT APPLICATION



As a below named inventor, I hereby declare that: 

My residence, post office address and citizenship are as stated below next 
to my name; 

I believe I am an original, first and joint inventor of the subject matter 
which is claimed and for which a patent is sought on the invention entitled 

METHOD FOR COMPUTING MODELS 
BASED ON ATTRIBUTES SELECTED BY ENTROPY 

the specification of which (check one) 

X is attached hereto. 



was filed on 

as Application Serial No. 

and was amended on 

(if applicable) 

I hereby state that I have reviewed and understand the contents of the above 
identified specification, including the claims, as amended by any amendment 
referred to above. 

I acknowledge the duty to disclose information which is material to the 
patentability of this application in accordance with Title 37, Code of Federal 
Regulations, §1.56. 

I hereby claim foreign priority benefits under Title 35, United States Code, §119 
of any foreign application(s) for patent or inventor's certificate listed below
and have also identified below any foreign application for patent or inventor's
certificate having a filing date before that of the application on which
priority is claimed:

Prior Foreign Application(s):               Priority Claimed

                                            Yes     No

(Number)     (Country)     (Day/Month/Year)



I hereby claim the benefit under Title 35, United States Code, §120 of any United 
States application(s) listed below and, insofar as the subject matter of each of
the claims of this application is not disclosed in the prior United States 
application in the manner provided by the first paragraph of Title 35, United 
States Code, §112, I acknowledge the duty to disclose information material to 
the patentability of this application as defined in Title 37, Code of Federal 
Regulations, §1.56 which occurred between the filing date of the prior 
application and the national or PCT international filing date of this 
application; 

I hereby declare that all statements made herein of my own knowledge are true and
that all statements made on information and belief are believed to be true; and
further that these statements were made with the knowledge that willful false
statements and the like so made are punishable by fine or imprisonment, or both,
under Section 1001 of Title 18 of the United States Code and that such willful 
false statements may jeopardize the validity of the application or any patent 
issued thereon. 

POWER OF ATTORNEY: As a named inventor, I hereby appoint the following attorneys 
and/or agents to prosecute this application and transact all business in the 
Patent and Trademark Office connected therewith. 

John W. Henderson, Jr., Reg. No. 26,907; Thomas E. Tyson, Reg. No. 28,543; Robert 
M. Carwell, Reg. No. 28,499; Jeffrey S. LaBaw, Reg. No. 31,633; Douglas H. 
Lefeve, Reg. No. 26,193; Casimer K. Salys, Reg. No. 28,900; David A. Mims, Jr.,
Reg. No. 32,708; Richard A. Henkler, Reg. No. 39,220; Volel Emile, Reg. No.
39,969; James H. Barksdale, Jr. Reg. No. 24,091; Anthony V. England, Reg. No.
35,129; Leslie A. Van Leeuwen, Reg. No. 42,196; Christopher A. Hughes, Reg. No.
26,914; Edward A. Pennington, Reg. No. 32,588; John E. Hoel, Reg. No. 26,279; and
Joseph C. Redmond, Jr., Reg. No. 18,753; Marilyn S. Dawkins, Reg. 31,140; Andrew
J. Dillon, Reg. No. 29,634; Kenneth C. Hill, Reg. No. 29,650; Melvin A. Hunn,
Reg. No. 32,574; Max Ciccarelli, Reg. No. 39,454; Jack V. Musgrove, Reg. No.
31,986; Daniel E. Venglarik, Reg. No. 39,409; Brian F. Russell, Reg. No. 40,796;
Philip T. Virga, Reg. No. 36,710; John G. Graham, Reg. No. 19,563; Matthew W.
Baca, Reg. No. 42,277; Justin M. Dillon, Reg. No. 42,486; and Antony P. Ng, Reg. 
No. 43,032. 

Send correspondence to: Andrew J. Dillon, FELSMAN, BRADLEY, GUNTER & DILLON, LLP, 
Lakewood on the Park, Suite 350, 7600B North Capital of Texas Highway, Austin, 
Texas 78731, and direct all telephone calls to Andrew J. Dillon, 512/343-6116. 



FULL NAME OF SOLE OR FIRST INVENTOR: Quan G. Cung

INVENTORS SIGNATURE:                    DATE:

RESIDENCE: 12605 Chittim Circle

Austin, Texas 78681

CITIZENSHIP: U.S.A.

POST OFFICE ADDRESS: 12605 Chittim Circle

Austin, Texas 78681



FULL NAME OF SECOND INVENTOR: Harry Roger Kolar

INVENTORS SIGNATURE:                    DATE:

RESIDENCE: 14145 North 92nd Street

Scottsdale, Arizona 85016

CITIZENSHIP: U.S.A.

POST OFFICE ADDRESS: P. O. Box S992

Scottsdale, Arizona 85252

DOCKET NUMBER: AT9-99-037

FULL NAME OF THIRD INVENTOR: Kevin Eric Norsworthy

INVENTORS SIGNATURE:                    DATE:

RESIDENCE: 11101 Comiso Pala Path

Austin, Texas 78726

CITIZENSHIP: U.S.A.

POST OFFICE ADDRESS: 11101 Comiso Pala Path

Austin, Texas 78726

FULL NAME OF SOLE OR FOURTH INVENTOR: Julio Ortega

INVENTORS SIGNATURE:                    DATE:

RESIDENCE: 4121 Durvin Drive

The Colony, Texas 75056

CITIZENSHIP: Spain

POST OFFICE ADDRESS: 4121 Durvin Drive

The Colony, Texas 75056

FULL NAME OF FIFTH INVENTOR: Frederick J. Scheibl

INVENTORS SIGNATURE:                    DATE:

RESIDENCE: 14 Rob Roy Road

Austin, Texas 78746

CITIZENSHIP: U.S.A.

POST OFFICE ADDRESS: 14 Rob Roy Road

Austin, Texas 78746

FULL NAME OF SIXTH INVENTOR: Vasken Torossian

INVENTORS SIGNATURE:                    DATE:

RESIDENCE: 1202 Oakwood Blvd.

Round Rock, Texas 78681

CITIZENSHIP: U.S.A.

POST OFFICE ADDRESS: 1202 Oakwood Blvd.

Round Rock, Texas 78681

DOCKET NUMBER: AT9-99-037

FULL NAME OF SEVENTH INVENTOR: Ben Peter Yuhas

INVENTORS SIGNATURE:                    DATE:

RESIDENCE: 121 Hawthorn Road

Baltimore, Maryland 21210

CITIZENSHIP: U.S.A.

POST OFFICE ADDRESS: 121 Hawthorn Road

Baltimore, Maryland 21210